Goto

Collaborating Authors

 Enterprise Applications


Online Learning in the Repeated Mediated Newsvendor Problem

Neural Information Processing Systems

Motivated by real-life supply chain management, we study a repeated newsvendor problem in which the learner is a mediator that facilitates trades between suppliers and retailers in a sequence of supplier/retailer interactions. At each time step, a new supplier and retailer join the mediator's platform with a private production cost and utility function, respectively, and the platform proposes a unitary trading price. The supplier accepts the proposed price if it meets or exceeds their unitary production cost and communicates their decision to the platform; simultaneously, the retailer decides the quantity to purchase at the proposed trading price based on their private utility function and sends their decision to the platform. If the supplier accepts the trading price, the transaction proceeds, and the retailer purchases their chosen quantity of units, paying the product of this quantity and the trading price to the supplier. The mediator's objective is to maximize social welfare. We design an online mediator's pricing strategy that features sharp regret rates under some natural assumptions, and we investigate the necessity of these assumptions, proving that relaxing any of them leads to unlearnability.


Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

Neural Information Processing Systems

Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. [5] identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. [5] for both proper and improper learning.


Exploring the Noise Robustness of Online Conformal Prediction

Neural Information Processing Systems

Conformal prediction is an emerging technique for uncertainty quantification that constructs prediction sets guaranteed to contain the true label with a predefined probability. Recent work develops online conformal prediction methods that adaptively construct prediction sets to accommodate distribution shifts. However, existing algorithms typically assume perfect label accuracy which rarely holds in practice. In this work, we investigate the robustness of online conformal prediction under uniform label noise with a known noise rate. We show that label noise causes a persistent gap between the actual mis-coverage rate and the desired rate α, leading to either overestimated or underestimated coverage guarantees. To address this issue, we propose a novel loss function robust pinball loss, which provides an unbiased estimate of clean pinball loss without requiring ground-truth labels. Theoretically, we demonstrate that robust pinball loss enables online conformal prediction to eliminate the coverage gap under uniform label noise, achieving a convergence rate of O(T 1/2) for both empirical and expected coverage errors (i.e., absolute deviation of the empirical and expected mis-coverage rate from the target level α). This loss offers a general solution to the uniform label noise, and is complementary to existing online conformal prediction methods. Extensive experiments demonstrate that robust pinball loss enhances the noise robustness of various online conformal prediction methods by achieving a precise coverage guarantee and improved efficiency.


On the necessity of adaptive regularisation: Optimal anytime online learning on ℓp-balls

Neural Information Processing Systems

We study online convex optimisation on ℓp-balls in Rd for p > 2. While always sub-linear, the optimal regret exhibits a shift between the high-dimensional setting (d > T), when the dimension d is greater than the time horizon T and the low-dimensional setting (d T). We show that Follow-the-Regularised-Leader (FTRL) with time-varying regularisation which is adaptive to the dimension regime is anytime optimal for all dimension regimes. Motivated by this, we ask whether it is possible to obtain anytime optimality of FTRL with fixed non-adaptive regularisation. Our main result establishes that for separable regularisers, adaptivity in the regulariser is necessary, and that any fixed regulariser will be sub-optimal in one of the two dimension regimes. Finally, we provide lower bounds which rule out sublinear regret bounds for the linear bandit problem in sufficiently high-dimension for all ℓp-balls with p 1.


Explaining the Law of Supply and Demand via Online Learning

Neural Information Processing Systems

The law of supply and demand asserts that in a perfectly competitive market, the price of a good adjusts to a market clearing price. In a market clearing price p the number of sellers willing to sell the good at p equals the number of sellers willing to buy the good at price p . In this work, we provide a mathematical foundation on the law of supply and demand through the lens of online learning. Specifically, we demonstrate that if each seller employs a no-swap regret algorithm to set their individual selling price--aiming to maximize its individual revenue--the collective pricing dynamics converge to the market-clearing price p . Our findings offer a novel perspective on the law of supply and demand, framing it as the emergent outcome of an adaptive learning processes among sellers.


Stop managing your business finances in spreadsheets -- QuickBooks is 300 right now

PCWorld

When you purchase through links in our articles, we may earn a small commission. Invoicing, expense tracking, inventory, payroll hours, sales, and financial reports -- it's all in one place, updated in real time, without a single copy-paste. The project profitability tools are the real standout: see exactly which clients and jobs are making you money and which ones are quietly eating into your margins. That kind of visibility changes how you price, pitch, and plan. Existing data transfers in from Excel or older QuickBooks versions, setup walks you through step by step, and customer support is included if anything snags.


Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Neural Information Processing Systems

Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to capture the complexity of real-world interactions that require software commonsense, layout understanding, and fine-grained manipulation capabilities. To address these limitations, we introduce OSWORLD-G, a comprehensive benchmark comprising 564 finely annotated samples across diverse task types including text matching, element recognition, layout understanding, and precise manipulation. Additionally, we synthesize and release the largest computer use grounding dataset JEDI, which contains 4 million examples through multi-perspective decoupling of tasks. Our multi-scale models trained on JEDI demonstrate its effectiveness by outperforming existing approaches on ScreenSpot-v2, ScreenSpot-Pro, and our OSWORLD-G. Furthermore, we demonstrate that improved grounding with JEDI directly enhances agentic capabilities of general foundation models on complex computer tasks with state-of-the-art performance, improving from 23% to 51% on OSWorld. Through detailed ablation studies, we identify key factors contributing to grounding performance and verify that combining specialized data for different interface elements enables compositional generalization to novel interfaces.


Lifelong Test-Time Adaptation via Online Learning in Tracked Low-Dimensional Subspace

Neural Information Processing Systems

Test-time adaptation (TTA) aims to adapt a source model to a target domain using only test data. Existing methods predominantly rely on unsupervised entropy minimization or its variants, which suffer from degeneration, leading to trivial solutions with low-entropy but inaccurate predictions. In this work, we identify entropy-deceptive (ED) samples, instances where the model makes highly confident yet incorrect predictions, as the underlying cause of degeneration. Further, we reveal that the gradients of entropy minimization in TTA have an intrinsic lowdimensional structure, driven primarily by entropy-truthful (ET) samples whose gradients are highly correlated. In contrast, ED samples have scattered, less correlated gradients. Leveraging this observation, we show that the detrimental impact of ED samples can be suppressed by constraining model updates within the principal subspace of backward gradients. Building on this insight, we propose LCoTTA, a lifelong continual TTA method that tracks the principal subspace of gradients online and utilizes their projections onto this subspace for adaptation. Further, we provide theoretical analysis to show that the proposed subspace-based method can enhance the robustness against detrimental ED samples. Extensive experiments demonstrate that LCoTTA effectively overcomes degeneration and significantly outperforms existing methods in long-term continual adaptation scenarios.


Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness

Neural Information Processing Systems

Smoothness is known to be crucial for acceleration in offline optimization, and for gradient-variation regret minimization in online learning. Interestingly, these two problems are actually closely connected -- accelerated optimization can be understood through the lens of gradient-variation online learning. In this paper, we investigate online learning with Hölder smooth functions, a general class encompassing both smooth and non-smooth (Lipschitz) functions, and explore its implications for offline optimization.


Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

arXiv.org Machine Learning

We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $Δ_{\min}$, the known lower bound is of order $ \frac{\log K}{\min\{Δ_{\min},\varepsilon\}}, $ or equivalently, up to universal constants, of order \[ \frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.